Abstract

A new coarse-coding technique is presented for labeling image pixels and regions to match exemplars or multivariate material signatures. This multinomial classification method can be used for object cuing and tracking, as well as for material identification and image segmentation. Pixels are classified, and classification reliability can be estimated, with only single-band histograms and one pass through each image band. An example of four-class labeling illustrates the power of this two-level classification algorithm.
1 Introduction


Image analysis usually requires recognition of objects: scene regions with characteristic colors, textures, shapes, or motions. I have developed a coarse-coding method for finding and labeling objects of approximately known appearance. The objects need not be homogeneous, complete, unoccluded, or of any fixed size or orientation. Any spectral or textural signature characteristics that differ from the background can be exploited.

My KNIFE digital-image-analysis system supports pixel classification followed by region extraction, as well as segmentation followed by region labeling. Pixel classification with extraction and noise cleaning is the faster of the two, but KNIFE's integrated split/merge segmentation [Laws 88a] followed by region-based classification tends to give better results. The latter approach could exploit region shape and context as well as multivariate pixel signatures [Wesley 86], although only shape-based postprocessing has been implemented to date [Fua 86, 87ab]. This paper describes the pixel classification technique, which is the same whether it is applied before or after region extraction.

I shall first list tasks for which classification is useful, then discuss the required processing steps. Briefly stated, derived data bands gather all relevant information into pixel feature vectors. Multinomial signatures (or histograms) are then gathered for each class prototype. Reference signatures are compiled into labeling functions that map gray levels into source-class assignments. Confusion matrices for the reference signatures help combine single-band class estimates into multiband labels. Connected components in the second-stage label map are then extracted as regions (or objects) for higher-level processing.

2  Labeling  Tasks 

There are tasks in which approximate foreground and background signatures are available, and in which target identification can be achieved by simple pixel classification or discriminant analysis. Cuing and counting tasks require that all scene objects of a certain class be identified after one or more prototypical examples have been provided [Touchberry 77; Conners 82, 84; Trivedi 84ab; Harlow 85; Laws 85; Lehrer 87]. Tomographic reconstruction of solid objects from slices poses similar recognition problems. Tracking is also an important application: although often limited to cuing and extraction of a single object in a temporal sequence of images, tracking of all discriminable objects in a scene may be necessary for robot vision or autonomous vehicle navigation [Ariki 81; Corkill 82; Laws 88b].

Identification of material types (soil, asphalt, concrete, water, etc.) is similar, except that a stored database of signatures is used [Haralick 69, 74; Wacker 69; Kettig 76; Nagao 76; Wiersma 76; Narendra 77; Peich 77, 80; Parikh 78; Sadjadi 79; Matsumoto 81; Swain 81]. Since simple classification is seldom adequate, researchers have developed multistage or relaxation analyses that exploit spatial and semantic relationships among regions [Duda 70, 80; Milner 70; Bajcsy 73, 76; Yakimovsky 74; Barrow 76, 77; Bullock 76; Faugeras 79, 81, 82; Haralick 79; Price 79, 81, 82; Ohta 80; Parma 80; Browse 82; Goldberg 82, 83; Hwang 83, 85; Kitchen 84; Matsuyama 85; Belknap 86; Wesley 86; Bhanu 87].

Material labeling remains difficult, especially for uncalibrated imagery and rapidly changing scenes. Cluster analysis and spatial reasoning can sometimes extract objects, but classification techniques are still necessary for their identification. The same is true for regions found by other segmentation methods [Laws 88a]. I have developed classification tools in the KNIFE package to perform this identification step by using any data bands and object signatures that are available.

Many  advances  have  been  made  since  the  early  days  of  pixel  classification  and  crop 
acreage  estimation  in  ERTS/Landsat  imagery.  We  can  now  take  advantage  of  better 
sensors  (far  superior  to  human  vision),  faster  computers,  and  improved  techniques  of 
restoration, filtering, interpolation, enhancement, correlation matching, multivariate classification, range estimation, and shape from shading.

Nevertheless, progress in automated material and object identification has been marginal for aerial reconnaissance and almost nonexistent for low-angle or ground-based imagery (as is needed by an autonomous vehicle). Edge detection and shape analysis are useful in constrained industrial-inspection tasks, but have had very limited success in natural imagery. Segmentation techniques, even those specially designed for texture segmentation, are only beginning to approach human performance. Integrated approaches that utilize edge-based and area-based techniques in a pyramid of gradually improving image resolutions are still highly experimental, and artificial intelligence methods remain inapplicable until semantic features can be extracted more reliably.

In  this  paper  I  show  the  benefits  of  returning  to  pixel  classification  as  an  initial  method 
of  image  partitioning  and  material  identification. 

3  Information  Gathering 

I assume that the initial information relating to a pixel's material type can be gathered into a vector of scalars stored (implicitly or explicitly) as the pixel's multivariate value. Intensity, color, infrared brightness, and radar reflectance are often available in this form, while many other point properties may be directly measurable in the industrial or medical domains. Derived data, such as hue, saturation, local texture, surface slope, albedo, and even optic flow, can also be associated with individual pixels. Similar treatment of region shape and semantic context may be possible, as described below.

I further assume that any available “normalizing” bands, such as range and surface slope, have been used to correct the other multivariate values, thus making position-independent material classification a reasonable approach. (The normalizing bands may also be useful during segmentation; however, they generally exhibit smooth variations that are difficult to exploit.) In some cases it may be desirable to perform a preliminary segmentation, compensate for regional shading and hypothesized object properties, and then reanalyze certain image regions by using the methods described in this paper.

The  important  point  is  that  all  information  relating  to  a  pixel’s  scene  label  must  be 
available  as  part  of  the  descriptive  vector.  Useful  properties  of  each  neighborhood,  such 
as  local  texture,  should  be  computed  and  assigned  to  the  central  pixel.  Classification  can 
then  be  done  without  considering  joint  probability  distributions  over  neighboring  pixels. 

I  am  thus  advocating  the  traditional  feature  extraction/classification  paradigm,  except 
that  I  employ  new  techniques  of  classification  and  spatial  analysis.  I  also  want  to  allow  for 
different  descriptive  vectors  in  different  image  regions  after  the  image  has  been  partially 
segmented, as well as different vectors for different analysis tasks.

An unlimited number of texture measures are available (local gray-level statistics [Piecli 70; Haralick 73; Nagao 76]; edge properties [Rosenfeld 71; Pietikainen 82; Kjell 84]; spot density [Zucker 75; Mitchell 77, 78, 79]; co-occurrence of texture elements [Davis 79, 81; Dyer 80; Hong 80; Terzopoulos 80; Voorhees 87ab]; texture energy [Laws 80]; fractal dimension [Pentland 84]; and others), each computed over a range of neighborhood sizes and shapes. My current practice is to precompute only local variance, then to compute more complex texture measures [Laws 88b] over highly textured areas if they must be further classified or segmented. Partitioning based on texture measures alone is very seldom required in natural imagery because regions that differ in texture will typically also differ in brightness, color, or variance. Computing texture measures after segmentation, when feasible, greatly reduces problems caused by regional edge effects.

Actually, any quantity closely related to local variance will work for classification and segmentation. I use the logarithm of local variance because it assigns a reasonable amount of weight to subtle variations, is relatively unaffected by illumination changes, and fits well within an eight-bit pixel descriptor. I compute the variance in small windows, using binomial (or approximate Gaussian) relative weighting patterns such as

$$
\begin{bmatrix}
1 & 4 & 6 & 4 & 1\\
4 & 16 & 24 & 16 & 4\\
6 & 24 & 36 & 24 & 6\\
4 & 16 & 24 & 16 & 4\\
1 & 4 & 6 & 4 & 1
\end{bmatrix}
$$

to emphasize the central pixels. (Rectangular windows with elliptical or diagonal weighting patterns could also be used [Laws 88b].) Note that the weights fall off rapidly, giving the effect of even smaller measurement windows and helping to avoid regional border effects. Small variance operators are essential for identifying such details as fine tree branches, although large operators (or small operators applied to reduced images) may be more convenient for extracting whole trees.
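A minimal Python/NumPy sketch of such a log-variance band follows, using the separable 5×5 binomial weights shown above (their outer product of 1, 4, 6, 4, 1). It is an illustration, not the KNIFE code: the reflected border handling, the `log1p` offset, and the fixed scale `LOG_SCALE` (which maps the largest possible 8-bit variance to code 255) are hypothetical choices, as are the function names.

```python
import numpy as np

# 1-D binomial weights; their outer product is the 5x5 window shown above.
BINOMIAL_5 = np.array([1.0, 4.0, 6.0, 4.0, 1.0])

# Hypothetical fixed scale: the largest variance of 8-bit data, (255/2)**2,
# maps to code 255, so codes stay comparable across images.
LOG_SCALE = 255.0 / np.log1p(127.5 ** 2)


def weighted_local_mean(image, weights_1d=BINOMIAL_5):
    """Separable weighted average over each pixel's window (reflected borders)."""
    w = weights_1d / weights_1d.sum()
    r = len(w) // 2
    padded = np.pad(np.asarray(image, dtype=float), r, mode="reflect")

    def smooth(v):
        # 'valid' convolution trims the padding back off each row or column.
        return np.convolve(v, w, mode="valid")

    rows = np.apply_along_axis(smooth, 1, padded)
    return np.apply_along_axis(smooth, 0, rows)


def log_variance_band(image):
    """Eight-bit band holding the log of binomially weighted local variance."""
    x = np.asarray(image, dtype=float)
    mean = weighted_local_mean(x)
    var = weighted_local_mean(x * x) - mean * mean
    var = np.maximum(var, 0.0)  # clamp tiny negative values from rounding error
    codes = np.round(LOG_SCALE * np.log1p(var))
    return np.clip(codes, 0, 255).astype(np.uint8)
```

Because the weights are separable, each pixel costs two 5-tap passes instead of a 25-tap window. A global gain change shifts the log variance by a constant, which is one way to see why the logarithm is relatively insensitive to illumination.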

Early researchers overlooked the power of such local data for classification of natural textures. Gray-level co-occurrence statistics [Haralick 71, 73, 74] were found to be more powerful than Fourier measures [Weszka 75, 76; Dyer 76], especially for nearest-neighbor statistics, but the comparative studies did not make it clear that the most powerful Fourier features were also high-frequency measures. These were computed across large image windows and their coefficients were averaged over large regions in the Fourier domain; although this reduced their power considerably, they still outperformed the low-frequency features.¹

Local measures generate bimodal histograms when computed over large-scale macrotextures: one peak for texture-element interiors and another for their borders. Although this created problems for Gaussian-based discriminant functions, it can be an advantage for a multinomial classifier. The KNIFE algorithm makes use of histogram shape rather than just mean and standard deviation.

The human eye is very sensitive to collinear edge alignments, even over large distances, and no local texture measure can capture such alignments. It would be useful if we could add measures of such gestalt properties to each pixel in an image. Characteristics of local shape environments could also be exploited for cuing, counting, and tracking tasks requiring that objects of a particular shape be found. We might note any nearby discontinuities and surface maxima, then use classification to seek pixels with similar local contexts. Parallel hardware, such as the Connection Machine, might be ideal for broadcasting local feature positions and compiling the shape-environment descriptors [Zucker 78; Davis 79, 81; Dyer 80; Hong 80; Hillis 86].

¹ An oft-cited study of Fourier phase measures [Eklundh 79] was likewise flawed. It should be repeated by using Gabor filters to measure local phase relationships among individual Fourier frequencies [Laws 88b]. Such measures would be appropriate for recognizing blurred or noisy textures with unreliable nearest-neighbor statistics.

Other contextual knowledge may be provided by distant regions, previous scenes, dynamic analysis goals, hypothesized interpretations, etc. Capturing such knowledge in a finite vector of pixel descriptors would be difficult, but there is a shortcut that is adequate for our purpose. We may be able to capture the combined effect of all such knowledge on a specific classification problem. It is as if we were asking an expert, or expert system, “Given all you know about this pixel and its environment, what are the relative likelihoods that it came from each material type?” Answers to this implicit question (from any number of evaluation functions) can then be combined with other descriptors to estimate material-class likelihood. This is similar to the approach used in the PROSPECTOR expert system [Hart 77; Duda 79; Reboh 81] for predicting mineral deposits at each position on a map.

This insight is the basis of my classifier. Each descriptor value gives evidence for its pixel's material class or object identity. Appropriate evaluation routines may examine the patterns of evidence and post their own opinions, in the manner of blackboard expert systems. A top-level evaluator then examines all of the evidence and makes the final judgment. Coarse coding² means that class membership is determined from the pattern of evidence rather than by majority vote or by selection of a single most-reliable estimator. It would be possible, for instance, for the top-level evaluator to reject all lower-level judgments, as when it assigns pixels with “grass” and “soil” characteristics to a “pasture” category.

The classification method described below is quite tolerant of “garbage” data bands, but there are computational advantages to using only the bands that carry information for a given task. A useful method of band selection is to attempt traditional or classificatory segmentation of each single data band, keeping only those that produce reasonable partitions. (Useless bands are either unsegmentable or result in randomly interspersed pixel labels. For some tasks we can compute the accuracy over known training regions, or the degree of correlation with a reliable classification method, as a screening measure.) During interactive analysis, a user typically examines the data bands or their segmentation/classification results to decide which contain useful information. Standard procedures develop quickly for any specific task.
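The accuracy-based screening measure mentioned above might be sketched as follows. Each band gets a one-band "most frequent class per gray level" rule built from labeled training pixels, and the band is kept if its accuracy on held-out pixels beats chance by a margin. The alternate-pixel split, the `min_gain` margin, and the function names are hypothetical; held-out pixels are used because accuracy measured on the training pixels themselves is optimistic when there are many gray levels and few samples.

```python
import numpy as np


def band_accuracy(train_vals, train_labels, test_vals, test_labels,
                  n_classes, levels=256):
    """Held-out accuracy of a one-band 'most frequent class per gray level' rule."""
    hist = np.zeros((n_classes, levels))
    np.add.at(hist, (train_labels, train_vals), 1.0)
    # Normalize each class histogram so that large classes do not dominate.
    probs = hist / np.maximum(hist.sum(axis=1, keepdims=True), 1.0)
    lut = probs.argmax(axis=0)  # empty gray levels default to class 0
    return float(np.mean(lut[test_vals] == test_labels))


def screen_bands(bands, labels, n_classes, min_gain=0.2):
    """Indices of bands whose held-out accuracy beats chance by `min_gain`."""
    train, test = slice(0, None, 2), slice(1, None, 2)  # alternate pixels
    chance = 1.0 / n_classes
    kept = []
    for i, band in enumerate(bands):
        acc = band_accuracy(band[train], labels[train],
                            band[test], labels[test], n_classes)
        if acc >= chance + min_gain:
            kept.append(i)
    return kept
```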

The bands remaining after such screening provide a vector of numbers (or other codes) characterizing each pixel and its environment. Some of the descriptors measure inherent object properties, while others may be derived from sophisticated processing of the surrounding image data. The data vectors in a region will all have the same structure, so that the elements form two-dimensional image bands. (Some of the data may be missing, as when specular reflection prevents measurement. Special placeholder codes should mark this fact.) If sufficiently refined measures were available, the classification task would now be trivial. The rest of this paper discusses the more typical case in which sophisticated classification and grouping techniques must still be employed.

² The term comes from the field of neural networks, or parallel distributed processing [Hinton 86]. Related multistage models, such as Samuel's signature tables [Samuel 67; Thosar 73], have been around for a long time.

4  Prototype  Representation 

Once data bands have been computed, we can extract training exemplars for compilation into labeling functions. The user might, for instance, outline a few regions interactively (or point to regions of a segmented image) and supply labels for them. All representative appearances of a material type or semantic object should be included in the training set so that nearest-neighbor classification can be used. (Such approaches, also referred to as memory-based reasoning [Stanfill 86; Waltz 87], permit classification decisions to be “explained” to the user by displaying the appropriate prototypes. Incorrect assignments can be remedied by including unrecognized regions as new prototypes. The coarse-coded classification proposed in this paper is not quite a nearest-neighbor technique, but the philosophy is similar.)

A  material  type  can  be  characterized  by  its  signature,  or  probability  distribution,  over 
the  possible  multivariate  pixel  values.  Any  one  material  type  may  have  several  signatures 
(depending,  for  instance,  on  illumination  or  scene  distance);  for  simplicity,  I  shall  treat 
these  as  separate  classes  that  happen  to  share  a  single  semantic  label.  Different  material 
types  (e.g.,  different  types  of  vegetation)  may  be  mapped  to  different  labels  for  some 
purposes  and  to  a  single  label  for  others.  Pixels  will  be  classified  under  the  distinct 
signature  classes,  then  grouped  into  regions  according  to  the  associated  semantic  labels. 
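The grouping step can be illustrated with a small sketch: signature classes are mapped to semantic labels through a lookup array, and 4-connected components of equal semantic label become regions. The `class_to_label` array, the choice of 4-connectivity, and the function name are illustrative assumptions; KNIFE's own region extraction and noise cleaning are not reproduced here.

```python
from collections import deque

import numpy as np


def group_regions(class_map, class_to_label):
    """Map signature classes to semantic labels, then find 4-connected regions.

    Returns (label_map, region_ids, n_regions).
    """
    label_map = np.asarray(class_to_label)[class_map]
    region_ids = np.full(label_map.shape, -1, dtype=int)
    h, w = label_map.shape
    n_regions = 0
    for r in range(h):
        for c in range(w):
            if region_ids[r, c] != -1:
                continue
            # Breadth-first flood fill over neighbors sharing this semantic label.
            region_ids[r, c] = n_regions
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and region_ids[ny, nx] == -1
                            and label_map[ny, nx] == label_map[y, x]):
                        region_ids[ny, nx] = n_regions
                        queue.append((ny, nx))
            n_regions += 1
    return label_map, region_ids, n_regions
```

For example, two vegetation signatures (classes 0 and 1) that share semantic label 0 merge into a single region wherever they touch, while a third class with its own label stays separate.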

Signatures have commonly been represented by Gaussian distributions in order to reduce the probability estimates to a manageable number of parameters.³ Such parametric distributions yield elegant discriminant functions, but seldom model image data realistically. Consider the trivial task of discriminating a two-valued salt-and-pepper distribution from a Gaussian with the same mean, standard deviation, and [zero] skewness. The two signatures differ only in their fourth and higher-order moments and cannot be separated by quadratic discriminant analysis, yet almost perfect pixel classification can be achieved with other techniques (including human vision).
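The salt-and-pepper example can be checked numerically. In the sketch below, a rounded Gaussian N(128, 32²) and a two-valued distribution at 96 and 160 share mean 128 and standard deviation 32, so Gaussian fits to the two classes are essentially identical; a plain histogram lookup nonetheless separates them. The specific values, sample sizes, and the use of unsmoothed histograms (acceptable here because the classes have equal sample counts) are assumptions made for the illustration.

```python
import numpy as np


def sample(n, rng):
    """Class 0: rounded Gaussian N(128, 32**2); class 1: salt-and-pepper at 96 or 160."""
    gauss = np.clip(np.round(rng.normal(128.0, 32.0, n)), 0, 255).astype(int)
    salt_pepper = rng.choice([96, 160], size=n)  # mean 128, standard deviation 32
    return gauss, salt_pepper


def histogram_lookup(train_by_class, levels=256):
    """Gray level -> class whose training histogram is highest there (ties -> class 0)."""
    hists = np.array([np.bincount(v, minlength=levels) for v in train_by_class])
    return hists.argmax(axis=0)


rng = np.random.default_rng(0)
train_g, train_sp = sample(20000, rng)
test_g, test_sp = sample(20000, rng)

# Matching first and second moments: any Gaussian fit sees two near-identical classes.
print(f"means {train_g.mean():.1f} / {train_sp.mean():.1f}, "
      f"std devs {train_g.std():.1f} / {train_sp.std():.1f}")

lut = histogram_lookup([train_g, train_sp])
accuracy = (np.mean(lut[test_g] == 0) + np.mean(lut[test_sp] == 1)) / 2
print(f"histogram lookup accuracy: {accuracy:.3f}")
```

The only errors come from Gaussian pixels that happen to land exactly on 96 or 160, about 1.5 percent of them.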

The traditional approach can be salvaged if signatures can be decomposed into sums of multivariate Gaussian distributions. Pixels can then be assigned to the subpopulations and hence to an overall class [Sclove 80]. Other parametric mixture densities can be handled similarly. Unfortunately, this decomposition is quite difficult for naturally occurring material signatures, even in the one-dimensional case. Signatures of simple material types may be quite irregular (especially after multiband transformation [Kender 76, 77]), whereas histograms of mixed terrain and vegetation may so closely approximate a broad Gaussian as to defy meaningful decomposition. We may also have to deal with nominal, ordinal, or nonnumeric band codes for which parametric methods are inappropriate.

Traditional multivariate classification compares each pixel vector with each material signature and assigns the pixel to the most similar (or least distant) class. Similarity metrics have been based on the probability of an observed pixel value, given the material class, or, via Bayes' theorem, on the probability of a material class, given the observed value. The latter is preferable, but requires estimates of a priori class probabilities. (Any classification technique assumes some model of these a priori probabilities; it is the explicit treatment of them that makes Bayesian classification preferable when it can be used. Human judgment, however, often deviates from the Bayesian model even when the necessary information is available.)

³ Even a three-dimensional signature of 256 gray levels per dimension (256³, or about 16.8 million bins) would be awkward to represent as a multivariate histogram. We may therefore have to deal with scores of signatures having a dozen or more data dimensions.

I  propose  the  following  approach.  Suppose  we  accept  observed  histograms  as  the  best 
available  estimates  of  material  and  object  signatures.  Histogram  prototypes  are  equivalent 
to  modeling  each  texture  class  as  a  multinomial  process  with  an  [almost]  independent 
probability  of  producing  each  possible  multivariate  value.  We  can  use  ratios  of  matching 
pixel probabilities for two source classes to estimate whether a particular descriptor vector
is  more  likely  from  one  signature  class  than  from  another  [Laws  85]. 
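For a single data band, the probability-ratio test described above can be sketched as follows. Per-pixel log ratios log P(v | A) − log P(v | B) come straight from the normalized histograms; under the multinomial model, pixels in a region are independent draws, so their log ratios add. The `eps` floor for empty bins and the function names are hypothetical. Extending this to multiband vectors through marginal histograms alone would require a further independence assumption that the text does not make.

```python
import numpy as np


def pixel_log_ratios(hist_a, hist_b, values, eps=1e-6):
    """log P(v | A) - log P(v | B) for each observed value v (positive favors A)."""
    pa = np.asarray(hist_a, dtype=float) / np.sum(hist_a)
    pb = np.asarray(hist_b, dtype=float) / np.sum(hist_b)
    # eps is a hypothetical floor so that an empty bin does not give an infinite ratio.
    return np.log(pa[values] + eps) - np.log(pb[values] + eps)


def region_log_ratio(hist_a, hist_b, values, eps=1e-6):
    """Multinomial model: pixels are independent draws, so their log ratios add."""
    return float(np.sum(pixel_log_ratios(hist_a, hist_b, values, eps)))
```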

Smoothing the histograms of continuous (i.e., interval or ratio) numeric measures introduces a desirable correlation between nearby bin values.⁴ A variety of smoothing kernels has been used for estimating probability distributions from histograms. I use Gaussian smoothing (and folding back of off-scale energy) with good results, but almost any moderate smoothing process would be acceptable. The optimal amount of smoothing depends on the expected variability of observed gray levels.
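One possible reading of Gaussian smoothing with fold-back of off-scale energy is sketched below: the histogram is convolved with a Gaussian kernel truncated at 3σ, and any mass that spills past either end of the gray scale is reflected back into range, so no probability is lost at the extremes. The truncation radius, the reflection rule, and the default `sigma` are assumptions, not the author's stated parameters.

```python
import numpy as np


def smooth_histogram(hist, sigma=2.0):
    """Gaussian-smooth a 1-D histogram, folding off-scale mass back into range.

    Returns a probability distribution over the original bins. Assumes the
    histogram is nonempty and has at least ceil(3 * sigma) bins.
    """
    hist = np.asarray(hist, dtype=float)
    n = hist.size
    radius = int(np.ceil(3 * sigma))          # truncate the kernel at 3 sigma
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()

    spread = np.convolve(hist, kernel)        # 'full' mode: n + 2*radius bins
    inside = spread[radius:radius + n].copy()
    below = spread[:radius]                   # mass that fell below bin 0
    above = spread[radius + n:]               # mass that fell above bin n-1
    # Reflect at the ends: bin -1 -> 0, bin -2 -> 1, ...; bin n -> n-1, ...
    inside[:radius] += below[::-1]
    inside[n - radius:] += above[::-1]
    return inside / inside.sum()
```

Larger `sigma` values correspond to the heavier smoothing the text recommends for object recognition; smaller ones suit tracking under stable illumination.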

Multidimensional histograms are awkward to use, especially when different subsets of the data bands are to be used for different tasks. Sparse storage techniques do exist [O’Rourke 84], but histogram smoothing degrades their effectiveness. I have chosen to store only the single-band (or marginal) histograms, which are easy to compute, manipulate, display, and interpret. If the univariate histograms are inadequate, multivariate transformations can be employed to compute additional pixel descriptors that summarize the multiband information with respect to a particular goal. When, for instance, color images are segmented, an intensity-hue-saturation representation is sufficiently decoupled that the original red-green-blue measures may usually be discarded [Laws 88a]. This is similar to using redundant multivariate transformations to search for axes along which multiband distributions are separable [Ohlander 78].

Note that I am not proposing univariate histogram representations as an approximation to multidimensional parametric signatures.⁵ Parametric representations simply do not capture the full complexity of multimodal real-world data. Multivariate histograms do capture this complexity, but only too well, which is why smoothing is required. Carefully chosen univariate histograms not only capture most of the signature information, but also simplify the problem of combining data bands to compute similarity or distance functions. I propose that the weighting or feature extraction problem be confronted at this point because the resulting data bands and univariate histograms are in a form suitable for human understanding. This permits an “expert system” approach to system development, as well as providing very fast techniques for image segmentation and object identification.

⁴ Smoothing does lose information when gray levels from one class are interdigitated with gray levels from another. Such interleaved picket-fence effects sometimes occur in derived data bands when two distinct populations are mapped to a single interval.

⁵ I do take such a stand elsewhere [Laws 88a], viewing segmentation that uses univariate histograms as a heuristic shortcut to full multidimensional cluster analysis.
5 Coarse Coding


Even a classification procedure that employs just the marginal histograms presents some data-handling difficulties. Assume that we have as many as sixteen data bands and 128 sixteen-band signatures (representing 128 source classes or possible appearances of materials). It would be inefficient to store 128 class likelihoods for each pixel, updating each sixteen times as the data bands are processed. Performing multiband classification pixel by pixel would seem more reasonable, but requires moving all 128 sixteen-band signatures through the computer for each pixel processed. Special parallel hardware would be needed to make such approaches practical.

Fortunately  there  is  an  efficient  way  to  classify  the  pixels  in  a  single  pass  through 
the  data  bands  and  signature  histograms.  Some  accuracy  may  be  sacrificed,  but  the 
processing  effort  and  intermediate  storage  are  greatly  reduced.  The  key  is  a  coarse-coded 
representation that encodes approximate likelihoods for all signature classes in a single
integer  or  bit  pattern.  The  final  bit  pattern  for  each  pixel  can  then  be  decoded  to  provide 
fairly  reliable  likelihood  estimates  for  each  signature  class.  The  more  data  bands  used, 
the  less  effect  misclassification  on  any  band  will  have.  Adding  more  data  bands  can  only 
improve  the  classification  results  as  long  as  the  stored  signatures  for  those  bands  are  truly 
representative. 

The essence of coarse coding is to let each bit or group of bits in a pattern encode independent information about the pixel's material class. The overall sequence of bits is then a more reliable indicator of material class than are the individual bit groups [Hinton 86]. This can be regarded as a two-step classification procedure: first each pixel vector is summarized by a coarse-coded bit pattern, after which the bit pattern is expanded to a vector of signature likelihoods or other outputs. (It can also be regarded as the feature extraction and classification steps of traditional pattern recognition, performed after the preliminary feature extraction that generated the data bands.)

The quality of the final classification obviously depends on the method of encoding and decoding these bit patterns. Bit groups are similar to the hidden units of connectionist pattern recognition [Rumelhart 86b]. That approach would use a gradient-descent algorithm to evolve a set of codes with satisfactory classificatory power on some training set; with luck, the codes would also generalize to additional classification problems. I have developed a more structured approach that uses image statistics and signature characteristics to select the codes dynamically for each task. Each relevant data band is reduced to a few bits in the code; the bit string is then decoded in the manner that best preserves the discriminability of the reference signatures.
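This section does not spell out the decoder. Purely as an illustration, here is one plausible decoder in the spirit of the confusion matrices mentioned in the introduction: for each band, a table of P(band code k | true class c) is estimated from labeled training pixels, and a pixel's code pattern is scored by summing log probabilities across bands. That naive-Bayes combination (which assumes band errors are conditionally independent), the add-`alpha` smoothing, the uniform prior, and the function names are all assumptions, not the author's method.

```python
import numpy as np


def estimate_confusions(band_codes, true_classes, n_classes, alpha=1.0):
    """P(code k | true class c) for each band, from labeled training pixels.

    band_codes: (n_bands, n_pixels) single-band class codes.
    Returns (n_bands, n_classes, n_classes); alpha is add-alpha smoothing.
    """
    n_bands = band_codes.shape[0]
    counts = np.full((n_bands, n_classes, n_classes), float(alpha))
    for b in range(n_bands):
        # counts[b] is a view, so np.add.at updates counts in place.
        np.add.at(counts[b], (true_classes, band_codes[b]), 1.0)
    return counts / counts.sum(axis=2, keepdims=True)


def decode(codes, confusions):
    """Posterior over classes for one pixel's per-band codes (uniform prior,
    bands assumed conditionally independent given the class)."""
    log_post = sum(np.log(confusions[b][:, k]) for b, k in enumerate(codes))
    post = np.exp(log_post - log_post.max())
    return post / post.sum()
```

Because each class is scored against its whole pattern of band codes, a class whose training pixels look like grass in some bands and soil in others can still be recognized as a distinct pattern, in the way the text's “pasture” example suggests.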

I choose a coding scheme in which each group of descriptor bits represents a pixel's most likely signature class, as estimated from a single data band. The single-band codes are concatenated to form a full coarse-coded pixel descriptor. (This is equivalent to storing each bit group in a separate data band. Many operating systems, though, limit the number of images or data files that can be open at one time.) The required number of bits per pixel depends on the number of data bands used and the degree of accuracy that each requires. Seven bits per band, for instance, are sufficient to designate one of 128 signature classes uniquely.
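The concatenation of seven-bit band codes can be made concrete with a short sketch. Band 0 occupies the low bits here, an arbitrary layout choice; `pack_codes` and `unpack_codes` are hypothetical names. Python's unbounded integers hold all 16 × 7 = 112 bits in one value, whereas a fixed-width implementation would need several machine words or separate byte planes.

```python
BITS_PER_BAND = 7  # 2**7 = 128 signature classes, as in the text


def pack_codes(codes, bits=BITS_PER_BAND):
    """Concatenate per-band class codes into one integer (band 0 in the low bits)."""
    word = 0
    for i, code in enumerate(codes):
        if not 0 <= code < (1 << bits):
            raise ValueError(f"code {code} does not fit in {bits} bits")
        word |= code << (i * bits)
    return word


def unpack_codes(word, n_bands, bits=BITS_PER_BAND):
    """Recover the per-band class codes from a packed descriptor."""
    mask = (1 << bits) - 1
    return [(word >> (i * bits)) & mask for i in range(n_bands)]
```

For example, `unpack_codes(pack_codes([3, 127, 0, 64]), 4)` returns `[3, 127, 0, 64]`.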

Alternatively, the available bits per pixel can be partitioned optimally among the data bands. Source classes can be clustered beforehand into groups with similar signatures for a particular band; only enough bits to represent the equivalence sets are then needed. Such partitioning can even increase overall classification accuracy [Hinton 86]. The set partitioning should be different for each data band, with more bits used for the more informative bands. Signature clusters should maximize similarity within a cluster and discriminability among clusters while maximizing classification accuracy across all bands. It may also be advantageous to group semantically related material classes, such as all of the vegetation signatures, for tasks in which such confusion is relatively unimportant.

Additional  bits  for  each  band  could  be  allocated  to  record  the  second  (or  even  third) 
best  class,  as  well  as  an  estimate  of  classification  reliability.  As  this  would  make  the 
decoding  more  difficult,  it  is  worthwhile  only  if  one  of  the  data  bands  contains  significant 
information  that  cannot  be  captured  by  the  pattern  across  all  bands.  On  the  other  hand, 
little penalty is incurred for applying independent classification algorithms to one
data  band  (and  to  the  previously  computed  bit  patterns),  recording  their  opinions  as  if 
additional  data  bands  had  been  employed. 

The  remainder  of  this  paper  treats  only  the  signature  classification  problem  and  not 
the  allocation  of  coarse-coding  bits  or  the  optimization  of  cluster  assignments. 

6  Labeling  Functions  and  Likelihood  Tables 

Given a data band, we need to transform the pixel values to bit codes that can be
appended  to  the  coarse-coded  descriptor  band.  This  is  just  classification  of  observed  gray 
levels  into  an  a  posteriori  most  likely  signature  class. 

I start with a set of single-band histograms representing important objects and expected background signatures. Signatures may come from a database (suitably corrected for scene and sensor characteristics) or from labeled image regions. If the background statistics are unknown, they can be estimated from a full-image histogram (if the target objects are small) or from ensemble statistics of typical backgrounds.

The first step is to smooth any continuous-valued reference signatures. This spreads each bin probability over several bins in a manner that models the uncertainties of gray-level reproducibility. Considerable smoothing is needed for object recognition, much less for object tracking under uniform illumination conditions. A certain minimum amount of smoothing is needed to account for random sampling effects in the original signature histograms [Laws 85].
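The text does not specify a smoothing kernel. The sketch below assumes a repeated [1, 2, 1]/4 binomial kernel, with any mass that would spill off either end kept in the edge bins so the result still sums to one; the number of passes stands in for the amount of smoothing (more for recognition, less for tracking). The function name is hypothetical.

```python
def smooth_histogram(counts, passes=2):
    """Spread each bin's probability into its neighbors with a [1, 2, 1]/4 kernel.

    The counts are first normalized to unit sum. Mass that would fall off
    either end stays in the edge bin, so the result still sums to one.
    More passes give a wider spread (more smoothing).
    """
    total = float(sum(counts))
    p = [c / total for c in counts]
    n = len(p)
    for _ in range(passes):
        q = [0.0] * n
        for i, v in enumerate(p):
            q[i] += 0.5 * v
            q[max(i - 1, 0)] += 0.25 * v      # left neighbor (or edge bin)
            q[min(i + 1, n - 1)] += 0.25 * v  # right neighbor (or edge bin)
        p = q
    return p
```

For example, one pass over a single spike `[0, 0, 4, 0, 0]` yields `[0.0, 0.25, 0.5, 0.25, 0.0]`.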

The next step is to estimate source class likelihoods for each possible gray level. The class code (i.e., bit pattern) for the most likely signature class can then be stored in a single-band lookup table for rapid pixel labeling. I call these lookup tables labeling functions to avoid confusion with other lookup tables described in this paper. The mechanics of actual classification depend on the available hardware, but lookup table transformations are typically quite efficient.

The  selected  source  class  for  a  given  gray  level  could  be  just  the  signature  having  the 
highest  probability  for  that  bin,  but  we  can  use  a  priori  source  probabilities  and  Bayes’ 
rule  (as  well  as  utility  functions  or  error  penalties)  to  make  a  better  selection.  The  prior 
class  probabilities  can  be  estimated  from  historical  frequencies  or  from  an  analysis  of  the 
data  band  histograms  (as  described  below). 
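A minimal sketch of building one band's labeling function, assuming smoothed class-conditional histograms and known priors (uniform by default). Utility functions and error penalties are omitted, and the name `labeling_function` is hypothetical.

```python
def labeling_function(signatures, priors=None):
    """Build a per-band lookup table mapping each gray level to a class index.

    signatures[k][g] is P(gray level g | class k) for one band (smoothed,
    unit sum); priors[k] is the a priori probability of class k.
    The label for g is argmax_k priors[k] * signatures[k][g]. This is the
    Bayes (maximum posterior) choice, because the common normalizer P(g)
    does not change which class wins. Ties go to the lower class index.
    """
    n_classes = len(signatures)
    if priors is None:
        priors = [1.0 / n_classes] * n_classes
    table = []
    for g in range(len(signatures[0])):
        scores = [priors[k] * signatures[k][g] for k in range(n_classes)]
        table.append(max(range(n_classes), key=scores.__getitem__))
    return table
```

Labeling a band is then one table lookup per pixel, e.g. `[table[g] for g in row]`. With a prior that favors class 1, gray levels near the crossover of two signatures shift to class 1.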

Once we have the labeling functions, we can compute expected confusion matrices (or, stated differently, a priori signature discriminability). Because this can be done before applying the labeling functions to the input data bands, it can be used for task-dependent band selection. The confusion matrices also provide probabilities and likelihoods needed to decode the coarse-coded pixel descriptors.

The trick is to pass each single-band reference signature through the band labeling function. Some of the gray levels recorded for that class will be correctly labeled, while others will be attributed to the incorrect signature classes. The relative frequencies of the different labels, normalized to unit sum, indicate the probability of each class assignment, given the source class; this forms one row of a single-band confusion matrix. The process is repeated for each signature to fill out a confusion matrix for each band.
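The confusion-matrix construction can be sketched as follows, assuming each signature is a list of gray-level probabilities and `table` is a per-band labeling function (a list mapping gray level to label). The function name is illustrative.

```python
def confusion_matrix(signatures, table, n_labels):
    """Row k gives P(assigned label j | source class k) for one band.

    Each reference signature is passed through the labeling function
    `table`: the probability mass at gray level g is credited to label
    table[g]. Each row is then normalized to unit sum.
    """
    matrix = []
    for sig in signatures:
        row = [0.0] * n_labels
        for g, p in enumerate(sig):
            row[table[g]] += p
        total = sum(row)
        matrix.append([v / total for v in row])
    return matrix
```

With the signatures `[[0.6, 0.3, 0.1, 0.0], [0.0, 0.2, 0.3, 0.5]]` and the table `[0, 0, 1, 1]`, the rows come out as roughly `[0.9, 0.1]` and `[0.2, 0.8]`.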

Given an assigned label, we can now use Bayes’ theorem to derive the single-band posterior probability of each source class. If the source classes are equally likely, the relative likelihoods can be read from the columns of the appropriate confusion matrix. If not, weighting factors proportional to source probabilities adjust the column entries to yield the likelihoods; normalization to unit sum converts these to posterior probabilities.
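The Bayes step can be written out as follows. This is a sketch: `priors` are whatever source probabilities are available, and the function name is hypothetical.

```python
def label_posteriors(confusion, priors, label):
    """P(source class k | assigned label) for a single band via Bayes' theorem.

    Takes the column of the confusion matrix for `label`, weights each entry
    by the class prior, and normalizes the result to unit sum.
    """
    weighted = [priors[k] * row[label] for k, row in enumerate(confusion)]
    total = sum(weighted)
    if total == 0:
        raise ValueError("label is never assigned to any source class")
    return [w / total for w in weighted]
```

A strong prior can outweigh a single band's opinion: with confusion rows `[0.9, 0.1]` and `[0.2, 0.8]` and priors `[0.9, 0.1]`, an assigned label 1 still leaves class 0 slightly more probable (0.09/0.17, about 0.53).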

One estimate of the source probabilities* can be obtained by passing the image data band histograms through the labeling functions, possibly combining the resulting label frequencies across bands. (These label frequencies can also be used for band selection; rates of target detection, for instance, would seldom be improved by using data bands in which the target label is never assigned.) Note that passing a reference signature through the labeling functions is generally much faster than passing image data through and then histogramming the result.
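The prior estimate can be sketched using only histograms, so no per-pixel pass is needed. Averaging the per-band label frequencies is one simple way to combine them. It is an assumption here, since the text leaves the combination open. Function names are illustrative.

```python
def label_frequencies(image_hist, table, n_labels):
    """Relative frequency of each label when an image band's histogram is
    passed through that band's labeling function `table`."""
    freq = [0.0] * n_labels
    for g, count in enumerate(image_hist):
        freq[table[g]] += count
    total = sum(freq)
    return [f / total for f in freq]


def estimate_priors(per_band_freqs):
    """Combine label frequencies across bands by simple averaging
    (an assumed combination rule)."""
    n_bands = len(per_band_freqs)
    n_labels = len(per_band_freqs[0])
    return [sum(f[k] for f in per_band_freqs) / n_bands for k in range(n_labels)]
```

A band in which some label's frequency is zero is also a candidate for exclusion when that label is the target.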

Combining all of the above, we can now get label probabilities for each band. These can be combined to get an estimate of the multiband class probabilities for any given pattern of coarse-coded descriptor bits. The probability of any bit pattern for a given source class is the product of the probabilities of the individual single-band labels. (I assume, as discussed above, that the selected data bands are sufficiently independent that we can ignore pairwise and higher-order band interactions. If this is not true, fuzzy combining functions might be more appropriate [Zadeh 74; Salton 83; Laws 85].) We can do this multiplication for each of the source classes, perform the Bayes’ inversion to get the posterior class probabilities, and assign the most likely class label to the bit pattern.
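Under the stated band-independence assumption, the multiband decision for one descriptor reduces to a product of single-band confusion-matrix entries followed by Bayes' rule. A sketch with hypothetical names:

```python
def multiband_posterior(confusions, priors, labels):
    """Posterior class probabilities for one coarse-coded descriptor.

    confusions[b][k][j] = P(label j in band b | source class k), and
    labels[b] is the label recorded for band b. Bands are treated as
    independent, so P(labels | k) is the product over bands. Returns None
    if the pattern has zero probability under every class.
    """
    scores = []
    for k, prior in enumerate(priors):
        likelihood = 1.0
        for conf, lab in zip(confusions, labels):
            likelihood *= conf[k][lab]
        scores.append(prior * likelihood)
    total = sum(scores)
    if total == 0:
        return None
    return [s / total for s in scores]


def best_label(posterior):
    """Index of the most probable class."""
    return max(range(len(posterior)), key=posterior.__getitem__)
```

With two bands whose confusion rows are `[[0.9, 0.1], [0.2, 0.8]]` and `[[0.7, 0.3], [0.4, 0.6]]`, equal priors, and labels `[1, 0]`, the bands disagree, but class 1 wins (0.16 against 0.035 before normalization).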

The labels for all possible bit patterns can be precomputed and stored in a classification lookup table if the number of bits is small. Similar lookup tables can store the second-best label, the ratio of best to second-best posterior probabilities, the entropy (i.e., information-theoretic uncertainty), or any other function of the posterior probabilities.
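When the descriptor is short, every pattern can be enumerated in advance. The sketch below keys the tables by the tuple of per-band labels rather than by a packed bit pattern (a simplification). For each pattern it stores the best label, the second-best label, the best-to-second ratio, and the entropy. Patterns with zero probability under the model are left out. At least two classes are assumed, and `build_tables` is a hypothetical name.

```python
from itertools import product
from math import inf, log2, prod


def build_tables(confusions, priors):
    """Precompute decoding information for every combination of band labels.

    confusions[b][k][j] = P(label j in band b | source class k).
    Returns {labels_tuple: (best, second_best, best/second ratio, entropy_bits)}.
    """
    label_ranges = [range(len(conf[0])) for conf in confusions]
    tables = {}
    for labels in product(*label_ranges):
        scores = [prior * prod(conf[k][lab] for conf, lab in zip(confusions, labels))
                  for k, prior in enumerate(priors)]
        total = sum(scores)
        if total == 0:
            continue  # pattern cannot arise under the model
        post = [s / total for s in scores]
        order = sorted(range(len(post)), key=post.__getitem__, reverse=True)
        best, second = order[0], order[1]
        ratio = post[best] / post[second] if post[second] > 0 else inf
        entropy = -sum(p * log2(p) for p in post if p > 0)
        tables[labels] = (best, second, ratio, entropy)
    return tables
```

A low ratio or a high entropy flags pixels whose final label is unreliable.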

Longer descriptors, or those physically stored in more than one intermediate band, may be more efficiently decoded by cached lookup. (I use software lookup tables for descriptors of up to 12 bits, and dynamic decoding with cached lookup for longer patterns.) Patterns found in the image are decoded by using the same formula as before to select the most likely source class. The pattern and its label are then attached to a list (or a pair of corresponding lists). Each computed pixel descriptor is sought in the cached list and is expanded to its vector of class likelihoods only if the pattern has not been seen previously. There are typically only a few hundred distinct patterns in an image. Linear search that commences with the most recently seen pattern (or the most recently created) is satisfactory, although a hashed storage scheme would be faster.

* These estimates are intermediate between a priori and a posteriori estimates, and thus arguably closer to the methods of human perception. The prior probability of having a zebra in my office is infinitesimal, but having seen a black-and-white striped animate object there, I should use an increased “zebra probability” in trying to identify it.
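The caching idea can be sketched with a Python dict, which provides the hashed storage mentioned above rather than a linear list search. Here `decode` stands for any function that maps a descriptor to a label and is assumed to be comparatively expensive. The function name is hypothetical.

```python
def decode_with_cache(descriptors, decode):
    """Label every pixel descriptor, calling `decode` once per distinct pattern.

    Returns the list of labels and the number of distinct patterns decoded.
    """
    cache = {}  # descriptor -> label (hashed storage)
    labels = []
    for d in descriptors:
        if d not in cache:
            cache[d] = decode(d)  # expensive step: expand to class likelihoods
        labels.append(cache[d])
    return labels, len(cache)
```

Because an image typically contains only a few hundred distinct patterns, the expensive decode runs a few hundred times, no matter how many pixels there are.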

This completes the classification procedure. Each whole-image or regional data band is passed through its corresponding labeling function to produce a source class estimate for each pixel. These estimates, or code bits, are appended to a band of coarse-coded pixel descriptors. The final coarse-coded descriptors are then passed through the classification lookup table or are dynamically decoded to obtain a consensus label for each pixel. These labels constitute a label map. The next step is to extract the connected components and instantiate them as regions in a knowledge base.

7  Region  Extraction 

The new region-extraction algorithm in the KNIFE package is greatly improved over the one reported earlier [Laws 82]. An initial scan through the label map locates each connected component; a second pass then renumbers the pixels to form a region map. Region descriptors computed during this process can be used to identify small noise regions that should be merged with neighbors [Laws 88a].
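The following is a generic two-pass connected-component labeling sketch, not the KNIFE code. It assumes 4-connectivity and resolves label equivalences with a union-find forest during the first pass. The second pass renumbers pixels in raster order and records region sizes, the kind of descriptor that can flag small noise regions.

```python
def connected_components(label_map):
    """Two-pass, 4-connected component extraction on a 2-D label map.

    Pass 1 gives each pixel a provisional region id and records which ids
    touch (same label, adjacent) in a union-find forest. Pass 2 renumbers
    the pixels to consecutive region ids in raster order.
    Returns (region_map, region_sizes).
    """
    rows, cols = len(label_map), len(label_map[0])
    parent = []  # union-find forest over provisional ids

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    provisional = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            lab = label_map[r][c]
            neighbors = []
            if r > 0 and label_map[r - 1][c] == lab:
                neighbors.append(provisional[r - 1][c])
            if c > 0 and label_map[r][c - 1] == lab:
                neighbors.append(provisional[r][c - 1])
            if not neighbors:
                parent.append(len(parent))  # start a new provisional region
                provisional[r][c] = len(parent) - 1
            else:
                root = min(find(x) for x in neighbors)
                for x in neighbors:
                    parent[find(x)] = root  # record the equivalence
                provisional[r][c] = root

    renumber, sizes = {}, []
    region_map = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            root = find(provisional[r][c])
            if root not in renumber:
                renumber[root] = len(sizes)
                sizes.append(0)
            region_map[r][c] = renumber[root]
            sizes[renumber[root]] += 1
    return region_map, sizes
```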

I use a statistical noise-cleaning technique. Each small connected component is considered for merger with the neighbor having the most similar multivariate regional histogram. The test for histogram similarity allows for the possibility that the small region’s histogram matches only a portion (e.g., one tail) of its larger neighbor’s histogram. (Most statistical goodness-of-fit tests assume random sampling and so require a full match.) A pseudo-F test for collinear surface fit then determines whether the merge is acceptable.
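The partial-match idea can be illustrated with a simple one-sided score. This score is a stand-in and not the test used in KNIFE: it is the fraction of the small region's pixels whose histogram bins also occur in the neighbor. The pseudo-F surface-fit check is omitted. Histograms are assumed to be dicts from bin to count, where a bin may be a tuple of gray levels for a multivariate histogram. Function names are hypothetical.

```python
def coverage(small_hist, neighbor_hist):
    """Fraction of the small region's pixels whose bin also occurs in the
    neighbor's histogram. A small region lying entirely in one tail of a
    larger neighbor's distribution still scores 1.0."""
    total = sum(small_hist.values())
    covered = sum(n for b, n in small_hist.items() if neighbor_hist.get(b, 0) > 0)
    return covered / total


def choose_merge_target(small_hist, neighbor_hists):
    """Return the id of the neighbor whose histogram best covers the small region.

    neighbor_hists maps region id -> histogram (dict from bin to count).
    """
    return max(neighbor_hists,
               key=lambda rid: coverage(small_hist, neighbor_hists[rid]))
```

Unlike a symmetric goodness-of-fit measure, this score does not penalize the large neighbor for having many bins that the small region lacks.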

A rather complex problem arises when hierarchical signature classes are available. Suppose, for instance, that several kinds of grass are known to the analysis system. All of the signatures are likely to be similar, even though sufficiently distinct to form separate signature sets. In labeling a grassy field, the classifier is then likely to assign different grass labels to neighboring pixels. Where large clumps of one type occur we would like the classifier to report them, but where labels are intermixed we would like the classifier to group them all under a generic “grass” label. Similarly, interspersed grass and soil should be labeled “field.”

We cannot search for the composites and then subdivide them into more specific signature classes: the ensemble signature for a mixture of unknown proportions is often unknowable or too broad to be useful. A better solution is to extract homogeneous regions from a fully labeled image, then replace intermixed labels with appropriate generic ones and extract any new homogeneous regions. This may have to be repeated with several different label generalizations, but the area to be reprocessed shrinks each time an identifiable region is found. I have worked out a way to do this during connected-component extraction, but have not yet implemented it in the KNIFE package.

The initial image or image region is thus partitioned into labeled regions. Selected regions can be further partitioned, if necessary, either by pixel labeling and grouping or by segmenting and then labeling. It is often effective to alternate the two techniques, since classification can break up complex imagery that stymies histogram-based segmentation, while spatial segmentation can identify subregions that match separate modes of a multimodal signature.
8  Example

Ground truth is difficult to obtain for natural imagery. Rather than present tables of classification accuracies, I am going to offer a visual presentation of classification on the training set. Success at such a task is not sufficient to prove the usefulness of my image analysis approach, but it is essential. A tracking or labeling system that cannot identify its training regions is hardly worth building.

I start with the color image in Figure 1(a) through (c), transformed to my VHS (vividness-hue-saturation) representation [Laws 88b]. From the vividness (or intensity) band I compute the 3 x 3 log-variance band in Figure 1(d). I would prefer a texture measure that responds less strongly to object edges, but have not yet developed a suitable normalization.
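A sketch of a 3 x 3 log-variance band follows. It assumes the variance is taken over the in-bounds neighbors at the image border and that a small constant `eps` is added before the logarithm so that flat areas stay finite. Any rescaling to the gray-level range used by KNIFE is not shown, and the function name is hypothetical.

```python
from math import log


def log_variance_band(band, eps=1.0):
    """Local 3x3 log-variance texture measure.

    For each pixel, the population variance of the gray levels in its 3x3
    neighborhood (clipped at the image border) is computed, and
    log(variance + eps) is stored.
    """
    rows, cols = len(band), len(band[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            vals = [band[i][j]
                    for i in range(max(r - 1, 0), min(r + 2, rows))
                    for j in range(max(c - 1, 0), min(c + 2, cols))]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals)
            out[r][c] = log(var + eps)
    return out
```

A vertical step edge produces a large response in the pixels beside it and none in the flat area away from it, which is the edge sensitivity noted above.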

I then trace the four major scene objects: road, sky, trees, and ground. I haven’t printed this training image, but it includes all image pixels except for rather thick strips along the boundaries of the four regions. Tracing with a mouse takes only a few seconds for most large regions, although I admit to taking a couple of minutes to extract the grass and mountains as a single gerrymandered “ground” region. (KNIFE could handle multiple exemplars per semantic class, but its display and editing tools for composite signatures are rather primitive.) Training signatures can come from previous images or from a database, although crude tracing of large regions is a good strategy for acquiring new materials.

Signatures,  or  histograms,  for  the  four  material  classes  are  shown  in  Figure  2.  No  one 
band  is  adequate  for  discriminating  all  four  textures,  but  the  patterns  of  confusion  differ 
from  one  band  to  another.  This  is  critical  if  coarse  coding  is  to  be  effective,  since  I  do  not 
exploit  interband  correlations. 

Figure 3 shows the single-band pixel labels computed with my method. Trees are marked with the darkest gray levels, followed by ground, road, and sky. The vividness band mislabels much of the sky, the mountain face, and the road. Hue mislabels much of the ground area. Saturation labels almost everything tree or road, while the log-variance measure produces noisy patches of tree and sky labels. None of these classifiers can reconstruct the training set, but at least they make different patterns of errors.

Figure 4(b) shows how this trait can be exploited. A second-level classifier is applied to the four labels at each pixel. This operator, constructed to optimize labeling of the training signatures, “second-guesses” the first-level classifiers and assigns a final pixel label. The result is still somewhat noisy, but most pixels have been classified correctly.

Since KNIFE is also a segmentation program, I can use its connected-component extraction routine to consolidate labeled pixels into regions and build corresponding data structures. Figure 4(c) shows the extracted regions when KNIFE’s seglevel parameter is set to 1 (the coarsest setting for normal use). Figure 4(d) goes one step further, merging any region smaller than 200 pixels into its most similar neighbor.†

The final result is a clean segmentation in about a tenth of the time that KNIFE’s integrated split/merge partitioning algorithm would require. Classification-based segmentation of a 256x256 region may take from one to ten minutes on a VAX 11/780, depending on the number of bands and the number of regions formed. The illustrated spectral/spatial labeling process has not only recovered its training set, but has done a good job of labeling the regional boundary pixels as well.

† KNIFE’s region-growing operator would have much the same effect if applied to each of the major regions.

[Figure 2: Prototype Signatures. Panels: (a) Vividness Histograms, (b) Hue Histograms, (c) Saturation Histograms, (d) Log-Variance Histograms]

[Figure 3: Single-Band Classification Maps. Panels: (a) Vividness Classification, (b) Hue Classification, (c) Saturation Classification, (d) Log-Variance Classification]

[Figure 4 panels: (a) Vividness (V) Band, (b) Multiband Classification]


9  Summary 

Many  problems  in  image  analysis  can  be  solved  by  labeling  pixels  and  grouping  them 
into  regions,  or  by  partitioning  pixels  into  regions  and  assigning  labels  based  on  regional 
pixel  properties.  Three  such  problems  are 

•  Identifying  materials  with  known  brightness,  color,  or  texture  distributions 

•  Identifying  multiple  scene  objects  once  some  of  them  have  been  found 

•  Tracking  objects  from  one  image  to  another. 

Pixel labeling provides tentative regions for higher-level analysis and integrates well with other segmentation methods.

The coarse-coding method of classification is fast and effective. It requires only single-band histograms as reference signatures, one data band for working storage, and one pass through each image band to perform the classification. Use of multinomial statistics avoids the multivariate Gaussian assumption built into traditional classification approaches. Needed probabilities can be estimated from the reference signatures and data bands, while missing data bands (in either the signatures or in areas of the image) can be handled with minimal difficulty. The approach is fairly intuitive and shares many of the benefits of blackboard-style expert-system development.